DETAILED ACTION

Response to Arguments 

1. Applicant's arguments filed 05/28/08 have been fully considered but they are not
persuasive. 

Applicant argues that neither Pitman, Ellis, nor Jiang teaches or suggests
calculating peak values of spectra of the audio signal corresponding to values at 
respective peaks of the band spectra from the band spectra of the audio signal, and 
obtaining, as the prescribed feature quantities, values of difference between peak 
values of frequency bands, each of the peak values being of a greatest spectrum 
strength among local maximums of each of the band spectra (Amendment, pages 13
and 14).
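
For orientation only, the disputed limitation can be read as the computation sketched below. This is an illustrative sketch under stated assumptions, not the applicant's implementation and not a method taken from any cited reference: the equal-width band split, the naive DFT standing in for the frequency transform, the use of adjacent bands for the differences, and all function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (a stand-in for an FFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def split_bands(spectrum, num_bands):
    """Split the spectrum into equal-width contiguous bands (an assumption)."""
    width = len(spectrum) // num_bands
    return [spectrum[i * width:(i + 1) * width] for i in range(num_bands)]


def band_peak(band):
    """Return (index, value) of the strongest local maximum in one band.

    Falls back to the overall maximum if the band has no interior
    local maximum (an assumption made for this sketch).
    """
    local = [
        (i, band[i]) for i in range(1, len(band) - 1)
        if band[i] > band[i - 1] and band[i] >= band[i + 1]
    ]
    candidates = local or list(enumerate(band))
    return max(candidates, key=lambda p: p[1])


def peak_value_differences(frame, num_bands):
    """Differences between the peak values of adjacent frequency bands."""
    bands = split_bands(magnitude_spectrum(frame), num_bands)
    peaks = [band_peak(b)[1] for b in bands]
    return [peaks[i + 1] - peaks[i] for i in range(len(peaks) - 1)]
```

Under these assumptions, a frame containing one strong low-frequency tone and one weaker high-frequency tone yields a single negative difference when split into two bands.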

The examiner disagrees. Ellis et al. teach "utilizing fast Fourier transform
process to generate audio frame signatures. Since matches can occur on several
consecutive frames, each match (audio and video) has a peak width associated
therewith. The number of such consecutively detected matches is referred to as the
peak width; examines the run structure in the segment signature and generates an
anticipated peak width value therefrom" (col. 19, lines 25-28; col. 31, lines 23-25;
col. 45, lines 25-30). Generating a peak width value from different audio frame
signatures implies calculating peak values of spectra of the audio signal corresponding
to values at respective peaks of the band spectra from the band spectra of the audio
signal, and obtaining, as the prescribed feature quantities, values of difference between
peak values of frequency bands, each of the peak values being of a greatest spectrum
strength among local maximums of each of the band spectra, since the consecutive
frames contain a plurality of peaks, and the peak width is associated with the generated
audio frame signatures.

Claim Rejections - 35 USC § 103 

2. The text of those sections of Title 35, U.S. Code not included in this action can 
be found in a prior Office action. 

Claims 6-13 and 31-42 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Pitman et al. (US PAP 2002/0143530) in view of Ellis et al. (US Patent
5,504,518), and further in view of Jiang et al. (US Patent 6,901,362).

As per claim 6, Pitman et al. teach a feature quantity extracting apparatus comprising:
a frequency transforming section for performing a frequency transform on a signal 
portion corresponding to a prescribed time length, which is contained in an inputted 
audio signal, to derive a frequency spectrum from the signal portion ("the audio signal is 
sampled and a frequency transform is performed on a succession of set of samples"; 
Abstract, lines 1-3; paragraph 30, lines 5-7);

a band extracting section for extracting a plurality of frequency bands from the 
frequency spectrum derived by the frequency transforming section and for outputting 
band spectra which are respective frequency spectra of the extracted frequency bands
("frequency bands"; Abstract, lines 5 and 6; paragraph 32, line 11); and

a feature quantity calculating section for calculating respective prescribed
feature quantities of the band spectra, the feature quantity calculating section obtaining 
the calculated prescribed feature quantities as feature quantities of the audio signal 
("extract features from unknown audio content"; paragraph 33; paragraph 54). 

However, Pitman et al. do not specifically teach the feature quantity calculating
section calculating peak values of spectra of the audio signal corresponding to values at
respective peaks of the band spectra from the band spectra of the audio signal, and
obtaining, as the prescribed feature quantities, values of difference between peak
values of frequency bands, each of the peak values being of a greatest spectrum
strength among local maximums of each of the band spectra.

Ellis et al. teach "utilizing fast Fourier transform process to generate audio frame
signatures. Since matches can occur on several consecutive frames, each match
(audio and video) has a peak width associated therewith. The number of such
consecutively detected matches is referred to as the peak width; examines the run
structure in the segment signature and generates an anticipated peak width value
therefrom" (col. 19, lines 25-28; col. 31, lines 23-25; col. 45, lines 25-30).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to generate a peak value among consecutive frames as 
taught by Ellis et al., in Pitman et al., because that would help better identify the audio 
content with high accuracy (col. 4, lines 5-7).

However, Pitman et al., in view of Ellis et al., do not specifically teach each of the
peak values being of a greatest spectrum strength among local maximums of each of 
the band spectra. 

Jiang et al. teach that the maximum local peak of the correlation function for
each band is then located in a conventional manner (col. 11, lines 10-12).

Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to locate the greatest local maximum peak as taught
by Jiang et al., in Pitman et al., in view of Ellis et al., because that would provide
improved segmentation and classification of audio signals (col. 1, lines 41-43).

As per claim 7, Pitman et al., in view of Ellis et al., and further in view of Jiang et 
al., further disclose the feature quantity calculating section uses binary values to 
represent the values of difference between peak values of frequency bands, the binary 
values indicating a sign of a corresponding one of the values of difference (Ellis et al.;
"binary value"; col. 15, lines 37-40).

As per claim 8, Pitman et al. teach a feature quantity extracting apparatus comprising:
a frequency transforming section for performing a frequency transform on a signal 
portion corresponding to a prescribed time length, which is contained in an inputted 
audio signal, to derive a frequency spectrum from the signal portion ("the audio signal is 
sampled and a frequency transform is performed on a succession of set of samples"; 
Abstract, lines 1-3; paragraph 30, lines 5-7);

a band extracting section for extracting a plurality of frequency bands from the
frequency spectrum derived by the frequency transforming section and for outputting 
band spectra which are respective frequency spectra of the extracted frequency bands
("frequency bands"; Abstract, lines 5 and 6; paragraph 32, line 11); and

a feature quantity calculating section for calculating respective prescribed 
feature quantities of the band spectra, the feature quantity calculating section obtaining 
the calculated prescribed feature quantities as feature quantities of the audio signal 
("extract features from unknown audio content"; paragraph 33; paragraph 54). 

However, Pitman et al. do not specifically teach the feature quantity calculating
section calculating peak values of spectra of the audio signal corresponding to values at
respective peaks of the band spectra from the band spectra of the audio signal, and
obtaining, as the prescribed feature quantities, values of difference between peak
values of frequency bands, each of the peak values being of a greatest spectrum
strength among local maximums of each of the band spectra.

Ellis et al. teach "utilizing fast Fourier transform process to generate audio frame
signatures. Since matches can occur on several consecutive frames, each match
(audio and video) has a peak width associated therewith. The number of such
consecutively detected matches is referred to as the peak width; examines the run
structure in the segment signature and generates an anticipated peak width value
therefrom" (col. 19, lines 25-28; col. 31, lines 23-25; col. 45, lines 25-30).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to generate a peak value among consecutive frames as
taught by Ellis et al., in Pitman et al., because that would help better identify the audio
content with high accuracy (col. 4, lines 5-7).

However, Pitman et al., in view of Ellis et al., do not specifically teach each of the 
peak values being of a greatest spectrum strength among local maximums of each of 
the band spectra. 

Jiang et al. teach that the maximum local peak of the correlation function for
each band is then located in a conventional manner (col. 11, lines 10-12).

Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to locate the greatest local maximum peak as taught
by Jiang et al., in Pitman et al., in view of Ellis et al., because that would provide
improved segmentation and classification of audio signals (col. 1, lines 41-43).

As per claim 9, Pitman et al., in view of Ellis et al., and further in view of Jiang et 
al., further disclose that the feature quantity calculating section calculates, as the 
prescribed feature quantities, values of difference between peak frequencies of 
frequency bands (Ellis et al., "detect multiple matches on a given key signature for 
consecutive frames, and generate an anticipated peak value"; col. 45, lines 25-31).

As per claim 10, Pitman et al., in view of Ellis et al., and further in view of Jiang et 
al., further disclose that the feature quantity calculating section represents the 
prescribed feature quantities using binary values indicating whether a corresponding 
one of the values of difference between peak frequencies of frequency bands is greater
than a prescribed value (Ellis et al.; "one binary value for positive elements"; col. 15,
lines 37-40).
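
As a reading aid, the feature quantities recited in claims 9 and 10 can be sketched as follows. This is a hypothetical illustration, not a method disclosed by any cited reference: it assumes the per-band peak frequencies are already available in Hz, that differences are taken between adjacent bands, and that the function names are placeholders.

```python
def peak_frequency_differences(peak_freqs_hz):
    """Differences between peak frequencies of adjacent bands (assumed pairing)."""
    return [b - a for a, b in zip(peak_freqs_hz, peak_freqs_hz[1:])]


def threshold_bits(differences, prescribed_value):
    """Binary values: 1 where a difference exceeds the prescribed value, else 0."""
    return [1 if d > prescribed_value else 0 for d in differences]
```

With three bands there are two differences, so the binary representation in this sketch is two bits per analyzed signal portion.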

As per claim 11, Pitman et al. teach a feature quantity extracting apparatus
comprising: a frequency transforming section for performing a frequency transform on a 
signal portion corresponding to a prescribed time length, which is contained in an 
inputted audio signal, to derive a frequency spectrum from the signal portion ("the audio 
signal is sampled and a frequency transform is performed on a succession of set of 
samples"; Abstract, lines 1-3; paragraph 30, lines 5-7);

a band extracting section for extracting a plurality of frequency bands from the 
frequency spectrum derived by the frequency transforming section and for outputting 
band spectra which are respective frequency spectra of the extracted frequency bands
("frequency bands"; Abstract, lines 5 and 6; paragraph 32, line 11); and

a feature quantity calculating section for calculating respective prescribed 
feature quantities of the band spectra, the feature quantity calculating section obtaining 
the calculated prescribed feature quantities as feature quantities of the audio signal
("extract features from unknown audio content"; paragraph 33; paragraph 54),

wherein the frequency transforming section extracts from the audio signal the signal
portion corresponding to a prescribed time length at prescribed time intervals ("the
audio signal is sampled and a frequency transform is performed on a succession of set
of samples"; Abstract, lines 1-3; paragraph 30, lines 5-7).



Application/Control Number: 10/667,465 Page 9 

Art Unit: 2626 

However, Pitman et al. do not specifically teach the feature quantity calculating
section calculating peak values of spectra of the audio signal corresponding to values at
respective peaks of the band spectra from the band spectra of the audio signal, and
obtaining, as the prescribed feature quantities, values of difference between peak
values of frequency bands, each of the peak values being of a greatest spectrum
strength among local maximums of each of the band spectra.

Ellis et al. teach "utilizing fast Fourier transform process to generate audio frame
signatures. Since matches can occur on several consecutive frames, each match
(audio and video) has a peak width associated therewith. The number of such
consecutively detected matches is referred to as the peak width; examines the run
structure in the segment signature and generates an anticipated peak width value
therefrom" (col. 19, lines 25-28; col. 31, lines 23-25; col. 45, lines 25-30).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to generate a peak value among consecutive frames as 
taught by Ellis et al., in Pitman et al., because that would help better identify the audio 
content with high accuracy (col. 4, lines 5-7).

However, Pitman et al., in view of Ellis et al., do not specifically teach each of the 
peak values being of a greatest spectrum strength among local maximums of each of 
the band spectra. 

Jiang et al., teach that the maximum local peak of the correlation function for 
each band is then located in a conventional manner (col. 11, lines 10-12).

Therefore, it would have been obvious to one of ordinary skill in the art at the
time the invention was made to locate the greatest local maximum peak as taught
by Jiang et al., in Pitman et al., in view of Ellis et al., because that would provide
improved segmentation and classification of audio signals (col. 1, lines 41-43).

As per claim 12, Pitman et al., in view of Ellis et al., and further in view of Jiang et 
al., further disclose that the peak frequency time variation calculating section obtains, as 
the prescribed feature quantities, binary values indicating a sign of a corresponding one 
of the time variation quantities of the peak frequencies (Ellis et al.; "binary value"; col. 15, 
lines 37-40). 

As per claim 13, Pitman et al., in view of Ellis et al., and further in view of Jiang et
al., further disclose that the peak frequency time variation calculating section obtains, as 
the prescribed feature quantities, binary values indicating whether a corresponding one 
of the time variation quantities of the peak frequencies is greater than a prescribed 
value (Ellis et al.; "one binary value for positive elements"; col. 15, lines 37-40).
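
The time variation quantities recited in claims 12 and 13 can be pictured with the sketch below. It is a hypothetical illustration only, not taken from the application or from any cited reference: it assumes the variation is the frame-to-frame change in each band's peak frequency, that a zero or negative change maps to 0, and the function names are placeholders.

```python
def peak_frequency_time_variation(frames):
    """frames[t][b] is the peak frequency (Hz) of band b in frame t.

    Returns, for each pair of consecutive frames, the per-band change
    in peak frequency (current frame minus previous frame).
    """
    return [
        [cur - prev for prev, cur in zip(frames[t - 1], frames[t])]
        for t in range(1, len(frames))
    ]


def sign_bits(variations):
    """Claim 12 style: 1 for a positive variation, 0 otherwise (assumed mapping)."""
    return [[1 if v > 0 else 0 for v in row] for row in variations]


def threshold_bits(variations, prescribed_value):
    """Claim 13 style: 1 where a variation exceeds the prescribed value."""
    return [[1 if v > prescribed_value else 0 for v in row] for row in variations]
```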

As per claims 31-33, 35-37, and 39-41, Pitman et al., in view of Ellis et al.,
and further in view of Jiang et al., further disclose a recording medium, and reproduction
medium; and a feature quantity storage section which stores at least a set of a feature
quantity of an audio signal and control instruction information associated therewith
(Pitman et al.; paragraph 54; paragraph 25).

Pitman et al., in view of Ellis et al., and further in view of Jiang et al., do not
specifically teach receiving television program data containing an audio signal and a 
video signal, and is capable of recording the television program data to a recording 
medium, wherein the feature quantity extracting apparatus obtains a feature quantity of 
the audio signal contained in the television program data, wherein the program 
recording apparatus further comprises: a recording control section for controlling 
recording of the television program data to the recording medium; the audio signal 
containing music played in a television program to be recorded, the control instruction 
information instructing the recording control section to perform or stop recording of the 
television program; a feature quantity comparison section for determining whether the 
audio signal contained in the television program data matches with the audio signal 
containing the music played in the television program based on both the feature quantity 
obtained by the feature quantity extracting apparatus and the feature quantity stored in 
the feature quantity storage section, and wherein when the feature quantity comparison 
section determines that the audio signal contained in the television program data 
matches with the audio signal containing the music played in the television program, the 
recording control section performs the control of performing or stopping recording of the 
television program data to the recording medium in accordance with an instruction 
indicated by control instruction information which is stored in the feature quantity 
storage section and associated with a feature quantity of the audio signal having been 
determined as matching with the audio signal containing the music played in the 
television program.

However, Ellis et al. teach receiving television broadcast signals over a
respective channel and demodulating the received signals to provide baseband video
and audio signals. The video and audio signals are thereafter supplied to the segment
recognition subsystem, wherein frame signatures for each of the video and audio
signals are generated, which are thereafter compared to stored key signatures to
determine if a match exists (col. 9, lines 55-62). The FIR module serves to improve
signature stability by averaging the audio spectral data over a number of television
frames, thus enhancing the likelihood of obtaining correct signature matches (col. 21,
lines 64-67). One having ordinary skill in the art at the time the invention was made
would have found it obvious to use feature extraction to match audio and video in a
television broadcast, because, as taught by Ellis et al., that would help determine what
programs, songs or other works have been broadcast (Ellis et al.; col. 10, lines 1-12).

As per claims 34, 38, and 42, Pitman et al., in view of Ellis et al., and further in
view of Jiang et al., further disclose that the program reproduction control apparatus 
further comprises an editing section capable of editing the television program data 
recorded in the recording medium (Ellis et al.; "updating a broadcast segment 
recognition database storing signatures"; col. 5, lines 2 and 3).

Conclusion 

3. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

4. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to LEONARD SAINT CYR whose telephone number is 
(571) 272-4247. The examiner can normally be reached on Monday-Friday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil, can be reached on (571) 272-7602. The fax phone
number for the organization where this application or proceeding is assigned is (571)
273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

LS 

08/09/08 

/Michael N. Opsasnick/ 
Primary Examiner, Art Unit 2626 



